An approximation to the greedy algorithm for differential compression
Authors
Abstract
We present a new differential compression algorithm that combines the hash value techniques and suffix array techniques of previous work. The term "differential compression" refers to encoding a file (a version file) as a set of changes with respect to another file (a reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have certain drawbacks; namely, they do not find the best matches for every offset of the version file. Our algorithm, hsadelta (hash suffix array delta), finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The algorithm has two variations depending on how we choose the block size. We show that if the block size is kept fixed, the compression performance of the algorithm is similar to that of the greedy algorithm, without the associated expensive space and time requirements. If the block size is varied linearly with the reference file size, the algorithm can run in linear time and constant space. We also show empirically that the algorithm performs better than other state-of-the-art differential compression algorithms in terms of compression and is comparable in speed.
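To make the idea of encoding a version file as changes against a reference file concrete, here is a minimal sketch of a block-indexed delta encoder. It is not hsadelta: it uses a plain dictionary keyed by block contents in place of the paper's hash values and suffix array. The function names, the ("copy", offset, length) and ("add", bytes) command format, and the default block size are all assumptions made for illustration. Like the abstract's description, it only finds matches that start on a block boundary of the reference (the granularity) and are at least one block long (the length threshold).

```python
from collections import defaultdict


def build_block_index(reference: bytes, block_size: int) -> dict:
    """Map each aligned block of the reference to the offsets where it starts."""
    index = defaultdict(list)
    for start in range(0, len(reference) - block_size + 1, block_size):
        index[reference[start:start + block_size]].append(start)
    return index


def match_length(reference: bytes, ref_pos: int, version: bytes, ver_pos: int) -> int:
    """Count how many bytes agree when both files are read forward."""
    n = 0
    while (ref_pos + n < len(reference)
           and ver_pos + n < len(version)
           and reference[ref_pos + n] == version[ver_pos + n]):
        n += 1
    return n


def encode_delta(reference: bytes, version: bytes, block_size: int = 4) -> list:
    """Encode version as copy/add commands relative to reference (illustrative)."""
    index = build_block_index(reference, block_size)
    commands = []
    literal = bytearray()  # bytes with no usable match, flushed as one "add"
    pos = 0
    while pos < len(version):
        # Among all reference blocks equal to the next version block,
        # keep the candidate that yields the longest forward match.
        best_len, best_ref = 0, -1
        for ref_pos in index.get(version[pos:pos + block_size], ()):
            length = match_length(reference, ref_pos, version, pos)
            if length > best_len:
                best_len, best_ref = length, ref_pos
        if best_len >= block_size:
            if literal:
                commands.append(("add", bytes(literal)))
                literal.clear()
            commands.append(("copy", best_ref, best_len))
            pos += best_len
        else:
            literal.append(version[pos])
            pos += 1
    if literal:
        commands.append(("add", bytes(literal)))
    return commands


def decode_delta(reference: bytes, commands: list) -> bytes:
    """Rebuild the version file from the reference and the command list."""
    out = bytearray()
    for command in commands:
        if command[0] == "copy":
            _, ref_pos, length = command
            out += reference[ref_pos:ref_pos + length]
        else:
            out += command[1]
    return bytes(out)
```

In this sketch the index holds about len(reference) / block_size entries. Growing block_size in proportion to the reference length therefore keeps the number of entries bounded, which is one way to read the abstract's remark about constant space. A fixed block_size gives finer granularity at the cost of a larger index.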
Similar resources
Convergence Rates of the Pod-greedy Method
Iterative approximation algorithms are successfully applied in parametric approximation tasks. In particular, reduced basis methods make use of the so-called Greedy algorithm for approximating solution sets of parametrized partial differential equations. Recently, a-priori convergence rate statements for this algorithm have been given (Buffa et al. 2009, Binev et al. 2010). The goal of the curre...
An Iterated Greedy Algorithm for Flexible Flow Lines with Sequence Dependent Setup Times to Minimize Total Weighted Completion Time
This paper explores flexible flow lines where setup times are sequence-dependent. The optimization criterion is the minimization of total weighted completion time. We propose an iterated greedy algorithm (IGA) to tackle the problem. An experimental evaluation is conducted to evaluate the proposed algorithm and, then, the obtained results of IGA are compared against those of some other exist...
An Iterated Greedy Algorithm for Solving the Blocking Flow Shop Scheduling Problem with Total Flow Time Criteria
In this paper, we propose an iterated greedy algorithm for solving the blocking flow shop scheduling problem with a total flow time minimization objective. The steps of this algorithm are designed to be very efficient. For generating an initial solution, we develop an efficient constructive heuristic by modifying the best-known NEH algorithm. Effectiveness of the proposed iterated greedy algorithm is t...
Parallel and Sequential Approximations of Shortest Superstrings
Superstrings have many applications in data compression and genetics. However, the decision version of the shortest superstring problem is NP-complete. In this paper we examine the complexity of approximating a shortest superstring. There are two basic measures of the approximations: the compression ratio and the approximation ratio. The well-known and practical approximation algorithm is the se...
Detecting communities of workforces for the multi-skill resource-constrained project scheduling problem: A dandelion solution approach
This paper proposes a new mixed-integer model for the multi-skill resource-constrained project scheduling problem (MSRCPSP). The interactions between workers are represented as undirected networks. Therefore, for each required skill, an undirected network is formed which shows the relations of human resources. In this paper, community detection in networks is used to find the most compatible wo...
Journal title:
- IBM Journal of Research and Development
Volume: 50
Publication date: 2006